The purpose of this study was to determine whether language
behavior represents an early conditioned verbal response or
whether it changes with age and experience. The study attempted
to define unique isolates of language on the basis of actual
language produced by young children. Tape recorded data were
collected for 12 years from 211 children in Oakland, California.
Data collected during the first three grades were used to define
eight "language style groups" (research groups), and statistics
recorded during grades 10-12 were used to assess and predict
language facility and growth. To create the research groups,
three test or rating variables (e.g., intelligence test and
verbal performance scores) and 15 language variables (e.g.,
"average length of communication unit") were utilized. The basic
hypothesis--children will not change with age their relative
positions to each other in language behavior--was supported with
respect to speech conventionality but not supported with respect
to problems of mazes (groups of words not resulting in
meaningful communication). It was supported with respect to
fluency, dependent clauses, and elaboration index for students
who began as poor users of oral English. These results have
several implications for curriculum development, especially in
the teaching of reading.
A MULTIVARIATE DESCRIPTION AND ANALYSIS OF
ORAL LANGUAGE DEVELOPMENT 



Introduction and Research Hypothesis 

That human beings vary enormously in their command
of the spoken word is a matter of common observation.
Proficiency--and even more, power--with oral language is an
important aid to adequate, successful living, no matter by
what values one judges. "Give me the right word and the
right accent, and I will move the world," wrote Joseph Conrad,
paying tribute to the power of language to influence thought,
feeling, and action, both within oneself and in others.

Just as adults vary in their command of the spoken 
word, so too do children, and to foster language growth, 
schools need to avoid an inflexible regimen for all pupils. 

The single textbook curriculum to which all pupils are
exposed in platoon or block fashion has always been one of the
serious defects of American as well as world-wide education.
Project Talent, a study of 440,000 American high school
students, identifies the lack of effective procedures for
individualizing instruction as the most serious defect in
American education. To individualize instruction completely
creates formidable problems, especially in skills requiring
a communication situation and therefore unsuited to programmed
instruction. However, if pupils with similar language needs
and difficulties could be identified, the task of organizing
instruction could be carried out with greater precision and
effectiveness. Determining whether or not such groups of
pupils exist and describing their language development are
the major purposes of the present research.

Of course, it is obvious that pupils' needs in
language growth are not identical. In crude fashion, teachers
can easily locate those whose speech varies for sociological
reasons--foreign parentage or non-standard dialects such as
Negro, Pidgin, or Appalachian. But even among those who
speak standard informal English there are different needs
based upon psychological rather than sociological features.
For instance, do some speakers hesitate, and in halting
fashion, fall into word tangles whereas some others express
themselves fluently and easily? Do some pupils, regardless
of dialect or the use of English as a second tongue, speak in
organized, coherent fashion whereas others are confused in
their thinking and muddled in their syntax? Do some know the
full repertoire of syntactical patterns--such useful ones as
appositives and infinitive clauses--yet seldom use them?
Such clusters of similar language behavior, if they can be
determined, should have important implications for planning
more efficient and effective language instruction.

Research in language arts includes many investigators
who focus on differences in pupils' language as one
basis for explaining why pupils differ in their communication
of thoughts, feelings, needs, and interests. Usually,
the researcher describes the differences existing among
children of different race, sex, social class, age, and other
demographic and social factors. Much less frequently have
researchers tried to explain differences in terms of language
isolates and individual language styles, though analysis in
this direction is now beginning to emerge in the literature.

In this report a new approach will be described,
one that attempts to define unique isolates of language on
the basis of actual language produced by young children.
Groups of children will be classified by their similarities
in language behavior according to language variables observed
at grades one, two, and three. On the basis of this
classification and language data generated at grades 10, 11, and 12,
an attempt will be made to show how these unique language
behaviors either change and evolve with age to a new adult
language style or remain fixed over time. As a research
question, an attempt will be made to determine whether
language behavior represents an early conditioned verbal
response of each individual or whether it is fluid, changing
with age and experience.

To develop analytical techniques, it will be
hypothesized that language strategies are established early in
life and undergo little change as a child increases in age.
For instance, children who use complex elaboration in speech
during their early years will be expected to continue this
verbal behavior as they age, and on this language feature
they will consistently maintain their same relative distance
from subjects who use simple, unelaborated language and do
not markedly improve their syntactical complexity as they age.
Since language is not absolutely static but does change,
usually improving in quality and complexity as age increases,
it is really being hypothesized that children maintain their
relative positions to one another as they age even though all
subjects will generally show an improvement in language from
childhood to adulthood.

The test of this hypothesis will be based on
longitudinal data collected for twelve years on a selected
representative sample of children living in Oakland, California,
from 1953 to 1965. The present study utilizes 211 children
whose spoken language was tape recorded at fixed yearly
intervals under standardized conditions. The data from these
tapes have been analyzed and the general characteristics of
the variables have been discussed and reported by Loban in
several earlier monographs (2, 3, 4, 5). The data collected from
these 211 children during their first, second, and third
grade tapings are used to define the language style groups
that will become the research groups of this study.
Statistics generated from the tenth, eleventh, and twelfth grade
tapings are used to assess and predict language facility and
growth.

The statistical procedures used for this analysis 
and testing are included in the broad framework of canonical 
correlation, principal components, statistical clumping pro- 
cedures, multivariate analysis of variance, and linear dis- 
criminant analysis. However, rather than present a compre- 
hensive review of these methods at this point, their basic 
properties will be discussed and their bearing upon the basic 
research question examined as they are introduced and needed. 
Description of the Basic Language Variables

During the thirteen-year longitudinal study carried
out by Loban, a large amount of information was collected on
the language used by each of the subjects. From the original
328 subjects, complete data are available for 211 (for some
of the 328, for example, a test or a taping session is
missing). A thorough analysis of all of the accumulated data
could take the collective lifetimes of a dozen researchers.
Consequently, one of the goals of this present investigation is
to examine some of the basic data and to reduce the massive
amount of information to a relatively small set. This could
reveal whether or not one can solidify and use as predictive
measures the basic components of language and its development
as measured by Loban's defined variables. Before undertaking
such a study, one of the facts of research reality to be faced
is that it might fail, but even if it does fail, much should be
learned that would make further analysis simpler and more
meaningful.

To create the observational research groups in this
present study, it was decided to begin by using (a) three
test or rating variables, and (b) a set of fifteen language
variables generated during the first three years of the data
collection period. These eighteen variables for all 211
subjects are the following:

Tests and Ratings

1. Mean intelligence scores as obtained from the
Kuhlman-Anderson test of mental ability, usually administered
several times to each subject between grades two and seven.
2. Mean teachers' ratings of oral language
performance during the thirteen-year period of schooling. (Each
subject had 13 or more teachers over the entire study period.)

3. Verbal performance scores obtained from a
kindergarten vocabulary test given at age five.

Language Variables Derived from Oral Language Taped Under 
Standardized Conditions 

4. Fluency score one: average length of communication
units at first grade. (A communication unit is each
independent predication with all of its related modification.)

5. Fluency score two: average length of communication
units at second grade.

6. Fluency score three: average length of communication
units at third grade.

7. Fluency score four: freedom from mazes at
first grade.

8. Fluency score five: freedom from mazes at
second grade.

9. Fluency score six: freedom from mazes at
third grade.

10. Dependent clause usage: ratio of dependent
clauses to communication units at first grade.

11. Dependent clause usage: ratio of dependent
clauses to communication units at second grade.

12. Dependent clause usage: ratio of dependent
clauses to communication units at third grade.

13. Conventionality index: success with use of
standard English usage at first grade.

14. Conventionality index: success with use of
standard English usage at second grade.

15. Conventionality index: success with use of
standard English usage at third grade.

16. Elaboration index: amount of elaboration or
complexity within the individual communication units at first
grade.

17. Elaboration index: amount of elaboration or
complexity within the individual communication units at second
grade.

18. Elaboration index: amount of elaboration or
complexity within the individual communication units at third
grade.

Examples of these variables are shown below, with each ex- 
ample containing an extreme case at each end of the spectrum 
as well as a more average case from the center. 

Examples of All Variables 

Test and Rating Variables 

1. Kuhlman-Anderson I.Q. scores range from 65 to
138 with a mean of 101.2 for the group of 211 subjects.

2. Teachers' ratings range from 1.8 to 4.3 with
a mean of 3.2 for the group of 211 subjects. Note that each
subject has one teacher's rating per year and the overall
group mean is the average of cumulative means obtained on
each subject over a thirteen-year period. The ratings were
made by thirteen or more teachers.

3. Kindergarten Vocabulary Test scores range from
0 to 83 with a mean of 49.0 for the group of 211 subjects.
Language Variables

Before proceeding to the development of analytic
techniques it will be useful to define carefully the terms
used for the language variables in this research.

Communication Unit: A communication unit may be defined
semantically as a group of words which cannot be further
segmented (divided) without the loss of their essential meaning.
Grammatically, a communication unit is any independent
predication and all of its relevant modification. Thus, "I saw a
man wearing a red hat" is a single unit of communication; if
"wearing a red hat" were omitted, the essential meaning of
that unit would have been changed and grammatically the
participial modifier of man would be missing. Furthermore, "with
a red hat" does not constitute a complete predication and it
cannot stand alone. However, "I saw a girl and she was wearing
a green hat" results in two communication units: (1) "I saw a
girl"; (2) "[and] she was wearing a green hat." Dividing the
sentence into two communication units does not result in loss
of meaning to either unit, and grammatically each is an
independent unit. The average length of these communication units
increases with advancing age, beginning with the brief
sentences of very young children and progressing to the complex
elaborated sentences of adults. The mean for the group of
211 subjects is 6.0 words per communication unit in grade one
and [value illegible in source] words per communication unit in
grade twelve. Examples of communication units used by subjects
in the oral language transcripts are as follows:




Short Units

Lower Grades: She is outside.
              (3 words)

Upper Grades: He is plowing.
              (3 words)

Medium Units

Lower Grades: They don't have very many clothes.
              (7 words)

Upper Grades: And it is just about a father and his
              four boys.
              (11 words)

Long Units

Lower Grades: Or we might play some games that I have
              in my house, some games that are in a box
              like that.
              (21 words)

Upper Grades: And they're all working together to try
              to get her husband into this high political
              [illegible] for bigger and better
              things and maybe to become president or
              whatever he's got his mind on.
              (40 words)

Maze Words as a Percentage of Total Words: A maze may be
defined as a group of words or initial parts of words not
resulting in a meaningful communication unit, i.e., a confused
tangle of language not necessary to the communication unit.
Most communication units contain no maze words whatsoever;
thus the following examples are designed to illustrate the
extent to which maze words can occur in a given
communication unit.

Minor Maze Problem 

Lower Grades: [and] and it looks like a cute little dog. 

(1 maze word in a total of 9 words) 



Upper Grades: I think maybe the one that is running,
              the girl that is running, [knows]
              apparently knows something that the other one
              doesn't know because she's got sort of a
              puzzled look on her face.

              (1 maze word in a total of 36 words)



Moderate Maze Problem 

Lower Grades: [probably] probably [going] they're
              going back to their house.

              (2 maze words in a total of 10 words)



Upper Grades: So what does Trina do but tell him to 

give her the money [which his last payoff] 
his last payroll because the company 
before firing him [had] of course [given 
him] had paid him off that money which he 
had deserved. 

(7 maze words in a total of 41 words) 

Major Maze Problem

Lower Grades: I got [one of my favorite toys a toy]
              my favorite toy in the garage.

              (7 maze words in a total of 15 words)

Upper Grades: [and then and and] and [it's it's very]
              it's written effectively [so that] so
              that you think that [Leon-] Leonard's
              going to come in [and] and sort of you
              know [r-] release [his his] his love for
              Tolson [and his] and his need for Tolson
              [in] in this kind of weird relationship.

              (19 maze words in a total of 56 words)
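
The maze percentage illustrated by the examples above can be sketched in a few lines of Python. This is only an illustration, not the scoring procedure used in the study: it assumes that maze words are the ones enclosed in square brackets, as in the transcripts above, and that a word is any whitespace-separated token. The function name `maze_percentage` is hypothetical, and the sketch makes no attempt to reproduce how the original counts treated contractions or word fragments.

```python
import re


def maze_percentage(transcript: str) -> float:
    """Return maze words as a percentage of all words in a transcript.

    Assumption: maze words are enclosed in square brackets and a word is
    any whitespace-separated token.
    """
    # Collect the text inside each pair of brackets and count its tokens.
    maze_spans = re.findall(r"\[([^\]]*)\]", transcript)
    maze_words = sum(len(span.split()) for span in maze_spans)

    # Drop the bracket characters but keep their contents, so that maze
    # words still count toward the total number of words.
    total_words = len(re.sub(r"[\[\]]", " ", transcript).split())
    if total_words == 0:
        raise ValueError("transcript contains no words")
    return 100.0 * maze_words / total_words


minor = "[and] and it looks like a cute little dog."
major = "I got [one of my favorite toys a toy] my favorite toy in the garage."

print(round(maze_percentage(minor), 1))  # 1 maze word out of 9
print(round(maze_percentage(major), 1))  # 7 maze words out of 15
```

For the two lower-grade examples the token counts agree with the counts printed in the text (1 of 9 and 7 of 15).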

Dependent Clause Ratio: A communication unit consists of an
independent clause which may or may not be modified by one or
more dependent clauses. Thus, "I saw a man" is an independent
clause (as well as a complete communication unit) which may
stand alone. One could also elaborate this with a dependent
clause and produce "I saw a man who was wearing a scarlet hat"
or with two dependent clauses and produce "I saw a man who
was wearing a scarlet hat which was made of feathers."
Actual examples from the oral transcripts of the subjects
are as follows:



No Use of Dependent Clauses 



Lower Grades: I know that. 

Upper Grades: That's all. 

Medium Use of Dependent Clauses 

Lower Grades: I don't know what that is. 

(1 dependent clause) 

Upper Grades : And it ended up the way she thought it 

would, somehow. 

(2 dependent clauses) 

Large Use of Dependent Clauses 

Lower Grades: I think they're going home after a long
              day's work because it looks like it's
              getting to be night because the stars are
              out.

              (4 dependent clauses)

Upper Grades: Well it was an illustration of how a man
              can be brainwashed to the point where he
              knows nothing but what he is told and
              does what he's told to do by a special
              person who's been set aside as his
              controller or master, however you'd like to
              put it.

              (6 dependent clauses)

Conventional English Usage : Standard English is defined as 

the type of language usage typically spoken in the political, 
social, economic, educational, and religious life of this 
country. This set of language habits is standard not because 
it is any more correct or capable than other varieties of 
English but rather because it is the type of English most 
frequently used in the conduct of the most important affairs 
of this country. Standard English ranges from informal to 
formal styles with many usages that are disputed or in transi- 
tion. In this research, we are concerned with obvious de- 
partures from standard English usage, such as the deviant 
forms following: 
Lower Grades: She ain't got no dress on or nothing.

              And the boy have a shirt on.

              She don't know nothing.

              And he brung it over.

              And her is trying it.

Upper Grades: And this man and the horse was plowing.

              Once upon a time there was two girls.

              And then when they move into it, Marlene
              found out that she didn't like it because
              it too far from school.

              And her mother and them liked it too.

Weighted Index of Elaboration: The weighted index of
elaboration assigns specific numerical weights to the component
syntactic elements within a communication unit. Thus, a unit
with simple adjectives and adverbs as modifiers will receive
fewer points than a unit containing more elaborated phrases
or clauses; clauses or phrases embedded within other
elaborated structures will receive still additional weight in
this index. The following examples range from short,
non-elaborated communication units to units containing a variety
of embedded structures.

No Elaboration

Lower Grades: We play house.
              (0 points)

Upper Grades: She was nineteen.
              (0 points)

Medium Elaboration

Lower Grades: On Thursdays there's Deputy Dave again.
              (2 1/2 points)

Upper Grades: And she's running towards it to see
              what's happening.
              (14 1/2 points)

Extensive Elaboration

Lower Grades: Well that looks like there's an Eskimo
              travelling in a sleigh with a whole
              bunch of dogs pulling it.
              (22 points)

Upper Grades: Well this isn't a plot so much as a
              situation where we'll say that the girl
              that's running beneath the tree is the
              daughter of the woman who's holding
              clothes or something in her hands.
              (27 1/2 points)

Discussion of Variables and Reductions of Their Number 

As might be expected, these 18 variables possess
many elements in common. This overlap of information can be
seen by an examination of the correlation matrix or array
reported as Table 1. In this table, the complete set of
correlation coefficients for the 18 variables for the total
group of 211 subjects is presented. Since $r_{ij} = r_{ji}$, only
the correlations above the main diagonal of the correlation
matrix are shown. These statistics are based on nearly
complete data, so that less than two percent of the raw data were
estimated. Whereas the original Loban study consisted of 328
subjects, only those 211 on whom complete data were available
were used in the present analysis.

Finally, it should be noted that variables
corresponding to 4 through 18 measured at grades ten, eleven, and
twelve are used to evaluate the effectiveness of the ability
to predict later life speech patterns. These corresponding
variables are:

19. Fluency score one: average length of communication
units at tenth grade.

20. Fluency score two: average length of communication
units at eleventh grade.

21. Fluency score three: average length of communication
units at twelfth grade.

22. Fluency score four: freedom from mazes at
tenth grade.

23. Fluency score five: freedom from mazes at
eleventh grade.

24. Fluency score six: freedom from mazes at
twelfth grade.

25. Dependent clause usage: ratio of dependent
clauses to communication units at tenth grade.

26. Dependent clause usage: ratio of dependent
clauses to communication units at eleventh grade.

27. Dependent clause usage: ratio of dependent
clauses to communication units at twelfth grade.

28. Conventionality index: success with use of
standard English usage at tenth grade.

29. Conventionality index: success with use of
standard English usage at eleventh grade.

30. Conventionality index: success with use of
standard English usage at twelfth grade.

31. Elaboration index: amount of elaboration or
complexity within the individual communication units at tenth
grade.

32. Elaboration index: amount of elaboration or
complexity within the individual communication units at
eleventh grade.

33. Elaboration index: amount of elaboration or
complexity within the individual communication units at
twelfth grade.

Thus, as can be seen, this study is based upon 33
variables measured on 211 independent subjects so that the
total number of bits of information is given by 33 x 211 =
6,963. From these bits of data, the hypothesis of language
use stability will be tested and evaluated.

Principal Component Analysis on the Language Variables at 
Grades One, Two, and Three 

The basic research hypothesis of this investigation
is that students do not change relative to one another with
respect to their use of language as they progress from
kindergarten to high school graduation. One way to evaluate this
hypothesis is to examine the correlations between the language
data generated during the first, second, and third grades with
the corresponding information collected during the tenth,
eleventh, and twelfth grades. Such an approach might not be
especially fruitful, for if it were attempted, it would be
necessary to examine the intercorrelations that exist between
five language variables generated at each of the six grades.
If, in addition, the three aptitude and achievement measures
were to be included in this set, then the total number of
unique measurements to be examined in the study of the
intercorrelations increases to 5 x 6 + 3 = 33. With this many
variables, the total number of distinct correlations to be
examined is given by (33 x 32)/2 = 528, of which 153 were reported
in Table 1. As might be expected, very few researchers are
able to study this number of correlations and comprehend all
of the information contained about the underlying variables
and their intercorrelations. Thus, simplification is necessary.
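
The counts quoted in this paragraph follow from the fact that a symmetric correlation matrix on n variables has n(n - 1)/2 distinct off-diagonal entries. A minimal Python check is given here as an illustration only; the function name `distinct_correlations` is hypothetical and not part of the original study.

```python
def distinct_correlations(n_variables: int) -> int:
    """Number of distinct pairwise correlations among n variables.

    A correlation matrix is symmetric with ones on the diagonal, so only
    the n * (n - 1) / 2 entries above the diagonal are distinct.
    """
    if n_variables < 0:
        raise ValueError("number of variables must be non-negative")
    return n_variables * (n_variables - 1) // 2


# Five language variables at six grades plus three test or rating variables.
total_variables = 5 * 6 + 3

print(total_variables)                         # all variables in the study
print(distinct_correlations(total_variables))  # correlations among all of them
print(distinct_correlations(18))               # correlations among the first 18
```

With 33 variables the function returns 528, and with the 18 early-grade variables it returns 153, the two figures given in the text.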




As can be seen by an examination of the correlation
matrix of Table 1 and the statistics reported in Table 2 on mean
scores and standard deviations, the Loban data at grades one,
two, and three show remarkable, and exceptionally high,
consistency. Whereas the average length of communication unit
increases from 6.0 words at first grade to 6.9 words at third
grade, the standard deviations remain remarkably constant:
1.4 to 1.3 words. Furthermore, careful examination of the
correlation coefficients of the length of communication unit
with the other variables of the study also demonstrates that
the correlations are quite stable. For example, the average
correlation of the length of communication unit with freedom
from mazes is .16 with the range in correlations extending
from .07 to .29. Thus, excluding for the moment the aptitude
statistics on tests and ratings, the Loban language proficiency
data show considerable consistency. The only exception for
the language variables is the elaboration index which shows
a monotonic increase with grade, going from 75.4 at grade one
to 83.4 at grade three. However, the standard deviations
remain constant and the correlations with these variables
remain relatively fixed and constant across grades.

Such consistencies in variances and groupings of 
correlation coefficients suggest that the basic data contain 
a large amount of redundancy and that a reduction in data is 
possible and meaningful. As a first thought, one might feel 
that data reduction could be achieved by using only first 
grade results and ignoring and discarding the second and third 
grade statistics. Such a possibility is not necessarily 
Table 2. Measures of Central Tendency and Variability
         for the 18 Variables Relating to
         Language Usage for the 211 Subjects of the
         Study at Grades One, Two, and Three.

                                                       Standard
     Variable                               Average    Deviation

  1  Vocabulary                               49.0       15.9
  2  Teacher rating of oral language           3.2         .6
  3  I.Q.                                    101.2       12.8
  4  Language fluency at first grade           6.0        1.4
  5  Language fluency at second grade          6.5        1.4
  6  Language fluency at third grade           6.9        1.3
  7  Freedom from mazes at first grade*        7.3        4.0
  8  Freedom from mazes at second grade*       6.6        4.0
  9  Freedom from mazes at third grade*        6.0        3.7
 10  Dependent clauses at first grade*        17.3       11.6
 11  Dependent clauses at second grade*       20.3       12.5
 12  Dependent clauses at third grade*        22.7       15.0
 13  Conventionality at first grade*           3.9        3.0
 14  Conventionality at second grade*          3.6        3.1
 15  Conventionality at third grade*           3.2        2.3
 16  Elaboration index at first grade         75.4       27.6
 17  Elaboration index at second grade        81.6       24.8
 18  Elaboration index at third grade         89.4       27.4

*Original score is multiplied by 100 to place the
variable on a scale having 1 as a lower limit.
without merit and could be attempted. However, intuition
suggests that perhaps all the data could be used to identify
the major underlying variables of the matrix of correlations.
In addition, it would seem reasonable that a variable based
on three observations, such as the simple average, has greater
reliability than any one measure for any one year. That this
is indeed true is easy to show.

For any one of the language variables measured over
three years, a general form for a composite measure is given by:

$$X = a_1 x_1 + a_2 x_2 + a_3 x_3$$

where $a_1$, $a_2$, and $a_3$ are arbitrary weighting constants and
$x_1$, $x_2$, and $x_3$ are the values of the variables at grades one,
two, and three, respectively. A statistical measure of the
efficiency of this variable as a composite measure is given by
its variance, which in this case is defined by:

$$\mathrm{Var}(X) = a_1^2\,\mathrm{Var}(x_1) + a_2^2\,\mathrm{Var}(x_2) + a_3^2\,\mathrm{Var}(x_3)
  + 2a_1a_2\,\mathrm{Cov}(x_1,x_2) + 2a_1a_3\,\mathrm{Cov}(x_1,x_3) + 2a_2a_3\,\mathrm{Cov}(x_2,x_3)$$

Since the variances and correlations of the Loban data are
almost equal by sets, it makes sense to substitute the average
variances and correlations into this equation. When this
substitution is performed, it is seen:

$$\mathrm{Var}(X) = \sigma^2(a_1^2 + a_2^2 + a_3^2) + 2\rho\sigma^2(a_1a_2 + a_1a_3 + a_2a_3)$$

where $\sigma^2$ = average variance and $\rho$ = average correlation. If
$a_1 = a_2 = a_3 = 1/3$, then X is the simple average of the three
years' testing. For this set of numerical constants:

$$\mathrm{Var}(X) = \mathrm{Var}(\bar{x}) = \tfrac{1}{3}\sigma^2 + \tfrac{2}{3}\rho\sigma^2 = \frac{\sigma^2}{3}(1 + 2\rho)$$

As long as $\rho < 1$, it follows that $\mathrm{Var}(\bar{x}) < \sigma^2$, the variance of
the variables for any one year, and the average is therefore a
more precise measure. As an example, it appears that for the
statistics on the elaboration index reported in Tables 1 and 2,
$\sigma$ is approximately equal to 25, and $\rho$ is approximately equal
to 1/2, so that:

$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{3}\left[1 + 2\left(\tfrac{1}{2}\right)\right] = \tfrac{2}{3}\sigma^2 = \tfrac{2}{3}(25)^2 \approx 417$$

As this last result indicates, the variance of the average
elaboration index is about 2/3 that of any one measure for any
one grade level and in this sense is a much more precise
measure of performance.
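
The variance formula for a weighted composite can be checked numerically. The following Python sketch is an illustration only: it builds a covariance matrix with a common variance and a common correlation, using the approximate elaboration-index values quoted in the text (a standard deviation near 25 and a correlation near 1/2), and compares the general double sum with the closed form for the simple average. The function names are hypothetical.

```python
def equicorrelated_cov(sigma: float, rho: float, k: int = 3) -> list:
    """Covariance matrix with variance sigma**2 and common correlation rho."""
    return [
        [sigma**2 if i == j else rho * sigma**2 for j in range(k)]
        for i in range(k)
    ]


def composite_variance(weights, cov) -> float:
    """Variance of a weighted sum: sum over i, j of a_i * a_j * Cov(x_i, x_j)."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n)
    )


sigma, rho = 25.0, 0.5  # approximate values quoted for the elaboration index
cov = equicorrelated_cov(sigma, rho)

var_mean = composite_variance([1 / 3, 1 / 3, 1 / 3], cov)
closed_form = sigma**2 / 3 * (1 + 2 * rho)

print(round(var_mean, 2), round(closed_form, 2))
print(round(var_mean / sigma**2, 3))  # fraction of a single year's variance
```

Under these assumed values the two computations agree, and the ratio of the variance of the average to the single-year variance is 2/3, as stated in the text.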

As this example suggests, the average value of the
scores for three years is a better measure than any one of the
individual years. When the correlation coefficients are
exactly or nearly equal, this is indeed the best procedure for
combining data. However, when the correlations are not
exactly equal, such as in the present data, one can obtain a
measure that is even better than the average of the three
individual scores. This more efficient value can be found by
Principal Component Analysis, a process which consists of
finding the best weighted sum of the variables. The procedure
employed to determine the best weighting coefficients is very
similar to the process shown in the previous discussion except
that the values $a_1$, $a_2$, and $a_3$ are not pulled out of the air
as was seemingly the case in the previous example. Instead,
one starts with the Var(X) and chooses the weights so that
the variance is maximized, subject to the condition that
$a_1^2 + a_2^2 + a_3^2 = 1$. When the variances and correlation
coefficients are equal, the maximizing weights are given simply by
$a_1 = a_2 = a_3 = 1/\sqrt{3} = .58$. When this occurs, a researcher can,
with no loss in generality, replace the individual a value by
1/3 and thereby produce the simple average. When the
correlations are not equal, these coefficients are not equal and are
much more difficult to determine. Fortunately, if the process
has been programmed for a high speed computer, the
determination of the optimum weights is a simple matter. So as to
achieve this data reduction, a Principal Component Analysis
was performed on each set of three variables shown in Table 1.
The results of the analysis are summarized in Table 3. Since
the values of the weights shown in Table 3 are almost all
equal to .58, it is clear that the average value of the
variables themselves could have been used in accomplishing the
reduction of data.
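
The first principal component of a small correlation matrix can be sketched with power iteration in plain Python. This is an illustration under stated assumptions, not the program used in the original analysis: the function name `first_principal_component` is hypothetical, the equal-correlation matrix uses an assumed common correlation of .5, and the unequal matrix contains made-up correlations chosen only to show that the weights then differ from one another.

```python
import math


def first_principal_component(corr, iterations: int = 500):
    """Return (weights, eigenvalue) for the largest eigenvalue of corr.

    Uses power iteration. The weights are scaled so that their squares
    sum to 1, which is the constraint under which the variance of the
    weighted sum is maximized.
    """
    n = len(corr)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance of the weighted sum of standardized scores.
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(n) for j in range(n))
    return v, eigenvalue


# Equal correlations (assumed value .5): all weights equal 1/sqrt(3), about .58.
equal = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
w_equal, ev_equal = first_principal_component(equal)
print([round(x, 2) for x in w_equal], round(100 * ev_equal / 3, 1))

# Hypothetical unequal correlations: the weights are no longer identical.
unequal = [[1.0, 0.7, 0.6], [0.7, 1.0, 0.75], [0.6, 0.75, 1.0]]
w_unequal, ev_unequal = first_principal_component(unequal)
print([round(x, 2) for x in w_unequal], round(100 * ev_unequal / 3, 1))
```

The second number printed on each line is the eigenvalue divided by the number of variables, expressed as a percentage, which corresponds to a "percent of variance explained by the component".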

So as to obtain uniform results, the Principal
Component Analysis on the Loban data has been performed using the
correlation matrix. This means that all variables have been
transformed from their regular measurement scale to one with a
mean value of zero and a standard deviation of one, or to one
in which the mean is 50 and the standard deviation is 10.
Since the basic research hypothesis of the study focuses on
relative changes and not absolute changes, such a
transformation does not affect the results. Also, since most researchers
are used to the treatment of standardized scores in the study
of behavioral data, it was decided to transform all scores to
a mean of 50 and a standard deviation of 10. Essentially,
this means that the average student in the group of 211 is the


Table 3. 



Principal Component Weighting Factors 
for the Five Fets of Language Variables 
Measured on Grades 0ne ; Two, and Three. 



v 



Variables 


Characteristic 


Grade 

One 


4, 5, 6 




Language Fluency 


.57 


7, 8, 9 




Freedom from Mazes 


.54 


10, 11, 


12 


Dependent Clauses 


.59 


13, 14, 


15 


Conventionality 


.56 


16, 17, 


18 


Elaboration Index 


.58 



a 2 : a 3 : Percent of Vari- 

Grade Grade iance Explained by 
Two Three the Component 



CD 

LO 

« 


CO 

LD 

• 


80 


O 

CD 

• 


CO 

to 

• 


71 


.59 


.55 


63 


.59 


• 

cn 

CO 


81 


O 

CO 

• 


• 

cn 

CD 


65 



referrent to which all comparisons must be made. While this 
may limit comparisons between studies, it contributes to in- 
ternal controls in this study, and it does not hinder the anal- 
ysis of the basic research hypothesis. 

Finally, it should be noted that use could have been
made of the Principal Component scores for the three years and
these numbers could have been employed for further data analysis
in place of the averages of the first three school years. This
would certainly have produced different results, but the
differences would have been minor. In any case, the final
results would have been virtually identical.

It should also be noted that the average values have
a high degree of reliability as composite measures since each
of them explains more than 63 percent of the variability
existing in the original variable measurements. For
conventionality, the principal component variable accounts for
81 percent of the total variability. For the average
conventionality measure:

\[ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{3}(1 + 2\rho) = \frac{\sigma^2}{3}[1 + 2(.7)] = \frac{2.4}{3}\sigma^2 = .80\sigma^2 \]

so even this measure accounts for 80 percent of the variance.
For behavioral data, these are quite high in numerical value.
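
The 80 percent figure can be checked numerically. The sketch below
assumes three standardized scores with a common correlation of .7,
the value used in the conventionality computation above; the
function name is hypothetical.

```python
def variance_of_mean(corr):
    """Variance of the simple average of standardized variables.

    Every variance and covariance enters with weight 1/n**2, so the
    result is the sum of all correlation-matrix entries over n**2.
    """
    n = len(corr)
    return sum(corr[i][j] for i in range(n) for j in range(n)) / (n * n)

rho = 0.7  # common correlation assumed for the three grade scores
corr = [[1.0, rho, rho], [rho, 1.0, rho], [rho, rho, 1.0]]
var_mean = variance_of_mean(corr)  # equals (1 + 2 * rho) / 3

print(round(var_mean, 2))
```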

Summary Findings Based on the Principal Component Analysis

Examination of the sample averages, standard deviations,
and correlation coefficients of the 18 variables related to
language use at grades one, two, and three suggested that the
data contained a sufficient degree of redundancy and
communality of information. Because of this, a principal
component analysis was performed on each set of language
characteristics. It was noted that the weighting coefficients
for each set were almost all equal to $a_i = 1/\sqrt{3} = .58$,
indicating that a simple average could be used to represent an
individual student's language usage. Furthermore, the gain
in precision obtainable from the mean score over an individual
grade was large enough to warrant the mean score as an adequate
measure of language performance.

Since the principal component analysis was based on
the correlation matrix and not on the covariances, the
averaging had to be based on standardized scores. As an example,
for elaboration index, this standardization is defined by:

\[ \bar{X} = \frac{1}{3}\left[\frac{X_1 - \bar{X}_1}{S_1} + \frac{X_2 - \bar{X}_2}{S_2} + \frac{X_3 - \bar{X}_3}{S_3}\right]
           = \frac{1}{3}\left[\frac{X_1 - 75.4}{27.6} + \frac{X_2 - 81.6}{24.8} + \frac{X_3 - 89.4}{27.4}\right] \]

Finally, to produce statistical measures that have a mean of
50 and a standard deviation of 10, the resulting averages were
transformed to:

\[ T = 50 + 10\,\frac{\bar{X}}{S_{\bar{X}}} \]

for each language variable, where $S_{\bar{X}}$ denotes the
standard deviation of the averaged scores. Thus, the fifteen
language variables were reduced to five. They are:

     1. Fluency
     2. Freedom from mazes
     3. Dependent Clauses
     4. Conventionality
     5. Elaboration Index
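
The two-step rescaling can be sketched as follows. The grade means
and standard deviations are the elaboration-index values quoted
above; the pupil scores and the function names are hypothetical.
Because this small invented group does not have a mean of zero on
the averaged score, the sketch also subtracts the group mean before
rescaling, which is an assumption of the example.

```python
import statistics

# Grade means and standard deviations for the elaboration index,
# as quoted in the text (grades one, two, three).
GRADE_STATS = [(75.4, 27.6), (81.6, 24.8), (89.4, 27.4)]

def average_z(scores):
    """Average of the three grade-level standardized scores for one pupil."""
    return sum((x - m) / s for x, (m, s) in zip(scores, GRADE_STATS)) / len(GRADE_STATS)

def to_t_scores(averages):
    """Rescale a group's averaged scores to mean 50 and standard deviation 10."""
    mean = statistics.fmean(averages)
    sd = statistics.pstdev(averages)
    return [50 + 10 * (a - mean) / sd for a in averages]

# Hypothetical raw scores (grades one, two, three) for four pupils.
pupils = [(60.0, 70.0, 80.0), (75.4, 81.6, 89.4),
          (100.0, 110.0, 120.0), (90.0, 85.0, 95.0)]
averages = [average_z(p) for p in pupils]
t_scores = to_t_scores(averages)

print([round(t, 1) for t in t_scores])
```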

Canonical Analysis of the Five Reduced Language Variables

The data reduction based upon the principal component
analysis at grades one, two, and three was also performed for
the data at grades ten, eleven, and twelve. Thus, 30 bits of
information measured on each pupil have been reduced to 10.
These 10 bits of information can be used to make a preliminary
investigation of the truthfulness of the basic research
hypothesis that students do not change relative to one another
with respect to their use of language as they progress from
kindergarten to high school graduation.

Even though 30 bits of information have been reduced
to 10 bits, the number of intercorrelations for the 10 variables
is given by $(10 \times 9)/2 = 45$, still a relatively large
number. Just as variables can be reduced, correlations can also
be reduced.
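
As a quick check of the count, the number of distinct correlations
among n variables is n(n - 1)/2. The helper name below is
hypothetical.

```python
import math

def pair_count(n_variables):
    """Number of distinct correlations among n variables: n(n - 1)/2."""
    return math.comb(n_variables, 2)

print(pair_count(10))  # the 10 reduced variables
print(pair_count(30))  # the 30 original language measures, for comparison
```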

One way to simplify the study of a large number of
correlations is to reduce the data by means of multivariate
canonical correlations and canonical variates. While the
computations involved in the determination of canonical
correlations and canonical variates are extremely complex, an
understanding of canonical correlations and what they measure
is easy to acquire, and with the use of high speed computers
their determination is simple.

To help the understanding of these multivariate
measures, consider classical multiple regression theory.
For this model, there exists a single univariate dependent
variable Y and a set of p independent variables
X: $(X_1, X_2, \ldots, X_p)$ which relate to Y collectively.
From the set of independent variables, a linear compound of the
following form is constructed:
$L = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p$, where
$\beta_1, \beta_2, \ldots, \beta_p$ are arbitrary constants
unspecified in advance of data collection. Next, the $\beta_p$
are determined so as to give the best representation of the
dependent variable Y. This is usually accomplished through the
method of least squares, in which the $\beta_p$ so selected have
the additional property that they maximize the correlation
coefficient between Y and L. The best linear compound that
accomplishes this task is called the multiple regression
equation, and the correlation coefficient between the estimated
Y values and the observed Y values is called the multiple
correlation coefficient of Y with X: $(X_1, X_2, \ldots, X_p)$
considered collectively.
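
The maximizing property of the least-squares weights can be seen in
a small sketch with two predictors. All data and function names
here are hypothetical; the point is only that no other linear
compound of the predictors correlates more highly with Y than the
least-squares compound does.

```python
import math

def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pearson(a, b):
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))

def least_squares_weights(x1, x2, y):
    """Solve the two normal equations for the weights on x1 and x2."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

def compound(b1, b2, x1, x2):
    """Linear compound L = b1 * X1 + b2 * X2."""
    return [b1 * u + b2 * v for u, v in zip(x1, x2)]

# Hypothetical observations on two predictors and one dependent variable.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.5, 2.0, 3.9, 3.8, 6.2, 5.9]

b1, b2 = least_squares_weights(x1, x2, y)
multiple_r = pearson(compound(b1, b2, x1, x2), y)

# Correlations obtained from a few arbitrary alternative weightings.
other_rs = [pearson(compound(w1, w2, x1, x2), y)
            for w1, w2 in [(1, 0), (0, 1), (1, 1), (2, -1)]]

print(round(multiple_r, 3), [round(r, 3) for r in other_rs])
```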

For canonical analysis, the model is taken one step
further in that Y is no longer univariate but is allowed to
increase in dimension, so that the canonical correlation model
starts with X: $(X_1, X_2, \ldots, X_p)$ and
Y: $(Y_1, Y_2, \ldots, Y_q)$ with $q \le p$ and the pair of
linear compounds being given by:

\[ L_X = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \]

\[ L_Y = \alpha_1 Y_1 + \alpha_2 Y_2 + \cdots + \alpha_q Y_q \]

Next, the $\beta_p$ and $\alpha_q$ are selected so as to maximize
the correlation between $L_X$ and $L_Y$. The resulting compounds
are called canonical variables and the correlation coefficient
between the compounds is called the canonical correlation.
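
For readers who want to see the quantity being computed, the sketch
below handles the smallest non-trivial case, two X variables and two
Y variables. It relies on the standard result that the squared
canonical correlations are the eigenvalues of
$R_{XX}^{-1} R_{XY} R_{YY}^{-1} R_{YX}$. The function names and the
correlation values are hypothetical and are not taken from the
study's tables.

```python
import math

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(m, n):
    """Product of two 2 x 2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def canonical_correlations(rxx, ryy, rxy):
    """Both canonical correlations for p = q = 2, largest first."""
    m = mul2(mul2(inv2(rxx), rxy), mul2(inv2(ryy), transpose2(rxy)))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    spread = math.sqrt(max(trace * trace / 4 - det, 0.0))
    eigenvalues = [trace / 2 + spread, trace / 2 - spread]
    return [math.sqrt(max(e, 0.0)) for e in eigenvalues]

# Hypothetical correlations within X, within Y, and between X and Y.
rxx = [[1.0, 0.5], [0.5, 1.0]]
ryy = [[1.0, 0.4], [0.4, 1.0]]
rxy = [[0.6, 0.3], [0.2, 0.5]]
r_first, r_second = canonical_correlations(rxx, ryy, rxy)

print(round(r_first, 3), round(r_second, 3))
```

When the variables within each set are uncorrelated and each X is
related to only one Y, the canonical correlations reduce to those
simple correlations, which gives a convenient sanity check.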


If Y consists of only one variable, then the canonical
correlation is identical to the multiple correlation and
the single canonical variate is identical to the multiple
regression equation. If $q \ge 2$, it can be shown that q
different sets of linear compounds
$[L_X^{(1)}, L_Y^{(1)}], [L_X^{(2)}, L_Y^{(2)}], \ldots, [L_X^{(q)}, L_Y^{(q)}]$
exist for the two sets of X and Y variables. These sets of
variables have the added property that the correlation
coefficient between the sets is zero. Operationally, this means
that if the underlying variables are jointly normal in form,
then the information summarized in each canonical variable is
statistically independent of the information contained in the
rest. In addition, if the q canonical correlations are ordered
by size from $R_1$, $R_2$, to $R_q$, they can be tested for
statistical significance by a test derived by Bartlett (7).
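
The text does not reproduce Bartlett's statistic. The sketch below
uses the chi-square approximation as it is commonly stated,
$\chi^2 = -[N - 1 - (p + q + 1)/2] \sum \ln(1 - R_i^2)$ with
$(p - k)(q - k)$ degrees of freedom after the first k correlations
are removed; that this is the exact form applied in the study is an
assumption. The first three correlations are those reported in
Table 5, while the last two are hypothetical placeholders, since
only the significant values are reported.

```python
import math

def bartlett_chi_square(canonical_rs, n_subjects, p, q, removed=0):
    """Chi-square approximation for the correlations left after removing `removed`."""
    remaining = canonical_rs[removed:]
    log_lambda = sum(math.log(1.0 - r * r) for r in remaining)
    statistic = -(n_subjects - 1 - (p + q + 1) / 2) * log_lambda
    degrees_of_freedom = (p - removed) * (q - removed)
    return statistic, degrees_of_freedom

# .79, .41, .34 are from Table 5; .10 and .05 are hypothetical placeholders.
rs = [0.79, 0.41, 0.34, 0.10, 0.05]

for k in range(3):
    chi_square, df = bartlett_chi_square(rs, n_subjects=211, p=5, q=5, removed=k)
    # Each statistic would be compared with a chi-square table at df degrees of freedom.
    print(k, round(chi_square, 1), df)
```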

The pairs of canonical variates that are retained
following a test of statistical significance are then examined
for meaning by a subjective evaluation of the magnitudes of
the individual estimates of the $\alpha_q$ and $\beta_p$ and the
correlations of the individual parent variables with the newly
manufactured canonical variates. On the completion of this
analysis, it is customary to give names to the resulting
canonical variates and use them as hypothetical constructs for
the remainder of a scientific discussion. This practice will be
followed in this narrative. Hopefully, it will provide valuable
insight into the relationships existing between the variables
and also simplify the basic analysis of the 33 variables used in
this study.

As is recalled, the basic decision reached following
the principal component analysis of the 18 language variables
for the first, second, and third grade data was to reduce the
18 variables to eight variables by replacing each set of
fluency measures, freedom from mazes scores, dependent clauses
scores, conventionality measures, and elaboration indices by
averages of the first three years' data. As reported, an
identical data reduction scheme was also performed on the
tenth, eleventh, and twelfth grade data. Thus, while 30 bits of
language usage information were obtained for each subject of
the study, a data reduction to 10 bits per subject has been
performed. These 10 bits of language information along with the
three test and rating measures constitute the basic data of the
study.

Since correlation coefficients are invariant and
unaffected by changes in averages and standard deviations, all
reduced scores were further translated to a mean of 50 and
standard deviation of 10. This means that certain analyses
to be performed on the data must be interpreted in the light
of this perspective. Whereas absolute differences might be of
interest, this study focusses on relative differences. It is
for this reason that the basic research hypothesis has been
stated in terms of relative changes. Thus, it is possible that
all 211 students became more conventional in their speech
over the range of years covered by the study. In absolute
terms, this would imply that language changes did occur and the
hypothesis of no language change should be rejected. However,
if a student starts at the first, second, and third grades at
1.30 standard deviations above the average student, then the
research hypothesis states that at the tenth, eleventh, and
twelfth grade this same student will still be 1.30 standard
deviations above the average student, even though the average
student is .8 standard deviations above his initial standing at
grades one, two, and three. In this sense, no change
corresponds to no change in relative distance and standings.
This way of viewing the interpretations must not be forgotten
as this narrative is studied since it limits the kinds of
interpretations that are justified.
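
The distinction between absolute and relative change can be made
concrete with a small sketch. The scores and the function name are
hypothetical; the uniform gain of .8 standard deviations mirrors the
example in the text.

```python
import statistics

def z_scores(values):
    """Standing of each pupil in standard-deviation units within the group."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical early-grade scores for five pupils.
early = [38.0, 47.0, 50.0, 56.0, 63.0]

# Assume every pupil later gains the same amount: .8 of the early standard deviation.
gain = 0.8 * statistics.pstdev(early)
later = [score + gain for score in early]

early_z = z_scores(early)
later_z = z_scores(later)

# Every raw score has risen, yet each pupil's relative standing is the same.
print([round(z, 2) for z in early_z])
print([round(z, 2) for z in later_z])
```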

The intercorrelations for the 10 bits of language
information are shown in Table 4. Since the correlation matrix
is symmetrical about the main diagonal, only the upper portion
of the matrix is presented. As can be seen, the correlation
matrix has been partitioned into four sub-matrices of
correlations. This is done to facilitate the reading of the
table. The intercorrelations for grades one, two, and three
appear in the upper five rows and first five columns. The
intercorrelations for grades ten, eleven, and twelve appear in
the bottom five rows and the last five columns. The
intercorrelations of grades one, two, and three with grades ten,
eleven, and twelve appear in the upper five rows and last five
columns. If $F_1$ is used to represent fluency at grades one,
two, and three, with corresponding notations for the remaining
variables, at the early ages it is seen that the correlation
between fluency and mazes is given by $r_{F_1 M_1} = -.21$. At
the later ages, this same correlation has been reduced so that
$r_{F_2 M_2} = .02$. The correlation between the fluency scores
at the early and later grades is given by $r_{F_1 F_2} = .37$,
while the correlation between freedom from mazes for the two
time periods is given by $r_{M_1 M_2} = .43$.

If the sub-correlation matrix for the early years
is examined, it is seen that fluency, dependent clauses,
and elaboration index are strongly correlated. The correlation
of fluency with dependent clauses is given by
$r_{F_1 D_1} = .66$, between fluency and elaboration index by
$r_{F_1 E_1} = .80$, and between dependent clauses and
elaboration index by $r_{D_1 E_1} = .82$. Apparently, these
three variables are measuring common elements of language usage,
and logically they should vary in the same direction. On the
other hand, it appears that mazes and conventionality are
measuring unique characteristics of language since the
intercorrelations with these variables are quite low.

Essentially, the same sort of correlations are
noted for the later ages. The correlation of fluency with
dependent clauses is given by $r_{F_2 D_2} = .74$, between
fluency and elaboration index by $r_{F_2 E_2} = .78$, and
between dependent clauses and elaboration index by
$r_{D_2 E_2} = .85$. As noted for the early ages, freedom from
mazes and conventionality show little relationship with the
remaining variables or with each other.

Finally, if the early data are compared with the
later data, it is seen that only one correlation is high and
that is the correlation of conventionality at the early ages
with conventionality at the later ages. The value of this
correlation is given by $r_{C_1 C_2} = .75$. The remaining
correlations are quite low.

In Table 5 are shown the weights to be attached to
the five language variables at each time period for those
canonical variates that are statistically significant. As
can be seen, the first set of canonical variates has a
canonical correlation coefficient given by $R_1 = .79$. The
hypothetical constructs that possess this correlation are
defined, using the weights of Table 5, by:

\[ L_X^{(1)} = .27F_1 + .34M_1 + .12D_1 + .73C_1 - .03E_1 \]

\[ L_Y^{(1)} = .04F_2 + .24M_2 + .17D_2 + .87C_2 - .06E_2 \]

Table 5. Canonical Variates and Correlations

Between the Five Language Variables at
Grades One, Two, and Three With the Five
Language Variables at Grades Ten, Eleven,
and Twelve.

Canonical Pair               One               Two               Three
Value of R                   .79               .41               .34

                         L_X(1)   L_Y(1)   L_X(2)   L_Y(2)        L_X(3)   L_Y(3)

Canonical Weights
  Fluency                  .27      .04      .36     -.07          -.69    -1.44
  Mazes                    .34      .24     1.06      .93           .17     -.01
  Dependent Clauses        .12      .17     -.02      .17           .03     -.52
  Conventionality          .73      .87     -.83     -.44           .42      .46
  Elaboration Index       -.03     -.06      .15      .06          -.38     1.16

Correlations with
Canonical Variates
  Fluency                  .45      .46     -.02   [illegible]     -.87     -.73
  Mazes                    .52      .37      .70      .91           .48      .08
  Dependent Clauses        .48      .50     -.02      .15          -.59     -.42
  Conventionality          .94      .96     -.31     -.27           .11      .07
  Elaboration Index        .49      .43      .01      .12          -.75     -.26

Examination of the coefficients or weighting factors shows
that these two hypothetical constructs are remarkably alike
for the two different time periods. This suggests that
$L_X^{(1)}$ and $L_Y^{(1)}$ are measuring the same language
characteristic at the two time periods covered by the study.
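
To illustrate how such constructs are scored, the sketch below
applies the first-pair weights from Table 5 to one pupil. The pupil's
scores are hypothetical, and treating the weights as applying to
scores expressed in standard-deviation units is an assumption of the
example.

```python
# Canonical weights for the first pair, copied from Table 5 (order: fluency,
# mazes, dependent clauses, conventionality, elaboration index).
WEIGHTS_LX1 = [0.27, 0.34, 0.12, 0.73, -0.03]  # grades one, two, and three
WEIGHTS_LY1 = [0.04, 0.24, 0.17, 0.87, -0.06]  # grades ten, eleven, and twelve

def canonical_score(weights, standardized_scores):
    """Weighted sum of a pupil's five standardized language scores."""
    return sum(w * z for w, z in zip(weights, standardized_scores))

# Hypothetical pupil, in standard-deviation units (assumed scale).
early = [0.5, -0.2, 0.3, 1.3, 0.4]
later = [0.4, 0.1, 0.2, 1.3, 0.3]

lx1 = canonical_score(WEIGHTS_LX1, early)
ly1 = canonical_score(WEIGHTS_LY1, later)
print(round(lx1, 2), round(ly1, 2))
```

Because conventionality carries by far the largest weight in both
compounds, a pupil's score on either construct is driven mainly by
that one variable.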

For the early years it is seen that conventionality
has the greatest weight with $\beta_{C_1}^{(1)} = .73$. Also,
conventionality shows the greatest weight on $L_Y^{(1)}$ with
$\alpha_{C_2}^{(1)} = .87$. As can be seen by examining the
correlations reported in the lower portion of Table 5, the
correlation between $L_X^{(1)}$ and conventionality at grades
one, two, and three is given by r = .94, while for $L_Y^{(1)}$
and conventionality at grades ten, eleven, and twelve, r = .96.
On the other hand, it appears that the remaining variables
contribute little to the two canonical variates. However, such
is not the case. If one examines the correlations of the two
canonical variables with the five language variables, it is
seen that conventionality is not the only variable defining
$L_X^{(1)}$ and $L_Y^{(1)}$. The correlations with fluency,
mazes, dependent clauses, and elaboration index at the early
years with $L_X^{(1)}$ are given respectively by .45, .52, .48,
and .49, while at grades ten, eleven, and twelve, these
correlations are given by .46, .37, .50, and .43. Even though
the simple correlations of the canonical variates with
fluency, mazes, and dependent clauses, and elab-



